[00:04.150 --> 00:08.290]  All right. Hello, everybody. My name is Master Chen, and you are watching or listening to
[00:08.290 --> 00:14.830]  "Twitter Word Frequency." Welcome to my talk. All right. So, let's start with who I am.
[00:14.830 --> 00:19.630]  I'm a prior BSides speaker, where I spoke about being a con man and
[00:20.210 --> 00:27.950]  working in Vegas surveillance. I spoke at DEF CON on VoIP security, as well as at DEF CON SkyTalks
[00:27.950 --> 00:31.710]  on "Automate Your Stalking," which we'll cover a little bit today.
[00:32.930 --> 00:37.690]  And most recently, the Recon Village in the last few years, where we talked about
[00:37.690 --> 00:42.170]  "Stalker in a Haystack" and "URL Shortener by Any Other Name," which was last year,
[00:42.170 --> 00:47.110]  and of course, this year, this talk. It's virtual now.
[00:48.370 --> 00:55.470]  You can find me at ChenBox. That is my handle on most social media, so feel free to follow.
[00:56.730 --> 01:00.170]  All right. So, let's get into this, and let's start from
[01:00.170 --> 01:06.090]  where we are at this point in time. I think that'll help us understand
[01:07.090 --> 01:14.730]  my mindset when it comes to this talk. So, of course, I started with "Automate Your
[01:14.730 --> 01:20.330]  Stalking," where I was digging into my Twitter research. This is bad. And we learned that
[01:20.330 --> 01:26.650]  during DEF CON SkyTalks a couple of years ago. So, the following year, I did "Stalker in a Haystack,"
[01:26.650 --> 01:32.910]  where we figured out how not to be stalked, or at least how to detect cyberstalking.
[01:33.610 --> 01:40.690]  That was good. And more recently, I've gotten into political sports betting, or, you know,
[01:40.690 --> 01:47.970]  market speculation, which is part of the story. And profiling: I mean social media
[01:47.970 --> 01:54.570]  profiling of different accounts. And of course, I do have a degree in psychology. So, all this
[01:54.570 --> 02:01.010]  wraps up into where we are at this point. These five pieces are what
[02:02.130 --> 02:10.120]  led to this research. So, about this talk. I'm going to start with
[02:10.120 --> 02:14.760]  the original goal and how I came up with what I was researching at that moment,
[02:15.460 --> 02:23.180]  why that failed, and why we moved on to what it is now as I'm presenting this,
[02:23.180 --> 02:27.020]  and what I would like my current research to be in the future.
[02:28.720 --> 02:34.440]  So, before we get into all of that, let's talk about caveats, disclaimers, and warnings.
[02:34.560 --> 02:43.480]  Oh my, I'm bad at that. Anyway, the first caveat, or first disclaimer: for those who have followed
[02:43.480 --> 02:48.820]  my work before, you know that I've said IANAL and IANAS, which is "I am not a lawyer" and "I am
[02:48.820 --> 02:53.740]  not a stalker." I'm going to add something new to that, because I'm very new to the whole data
[02:53.740 --> 03:01.640]  science thing. I've only recently gotten into using pandas, scikit-learn, NumPy, and all
[03:01.640 --> 03:10.000]  these different data analytics tools. So, I've added IANADS: I am not a data scientist. I'm not
[03:10.120 --> 03:15.280]  a lawyer, I'm not a stalker, and I am definitely not a data scientist, but I'll try my best.
[03:16.240 --> 03:22.180]  So, this project is a work in progress, and I'm hoping that after I explain what we're
[03:22.180 --> 03:26.840]  doing here today, maybe I can get some help from the community to further this project,
[03:26.840 --> 03:32.280]  or at least, now that I've defined the basics, I can continue and make this something very strong.
[03:33.660 --> 03:40.600]  This is a neutral tool. I'm not building what I'm building today for stalking purposes;
[03:40.600 --> 03:44.460]  this is just a tool. You use it how you want, but of course,
[03:44.460 --> 03:50.140]  as always, I'm not going to be held responsible for what you do with my research.
[03:51.740 --> 03:56.920]  And I did not stand on the shoulders of giants for this research. What I mean by that is,
[03:56.920 --> 04:00.360]  I know that there's a lot of research already in this space when it comes to
[04:00.360 --> 04:07.820]  frequency analysis, word frequency analysis, sentiment analysis, and NLP:
[04:07.820 --> 04:11.840]  not neuro-linguistic programming, but natural language processing.
[04:12.480 --> 04:19.440]  I know that all that research is out there, and I barely used any of it.
[04:20.140 --> 04:23.940]  I would like to integrate it later, but I just want to let you know that for this
[04:23.940 --> 04:31.220]  particular talk, it really was not used. So, it is definitely a work in progress.
[04:32.840 --> 04:38.860]  All right. So, again, for those of you who know me, I'm not a sports guy. I've only recently
[04:38.860 --> 04:43.040]  gotten into hockey, and I've always been a UFC fan, but other than that, I'm not a sports person.
[04:43.040 --> 04:48.560]  So, I've never been into the sportsbook, but one thing I know from being in Vegas my
[04:48.560 --> 04:54.220]  entire life is that there's a sportsbook for every market. It doesn't matter what the market
[04:54.220 --> 05:00.180]  or the speculation is, there's a sportsbook or a betting market for it. And that's
[05:00.180 --> 05:06.340]  where we start our story. If anybody's ever heard of PredictIt.org, it's a market for
[05:06.340 --> 05:11.040]  political sports betting. You can bet on who you think the next president's going to be,
[05:11.040 --> 05:17.200]  not just in America, but globally. Pick a country, pick a state, pick, you know,
[05:17.200 --> 05:23.540]  the impeachment hearings. Who will run the White House? Who's going to be the VP pick? All of
[05:23.540 --> 05:28.520]  these different markets are on PredictIt.org. So, you can make a fair amount of money
[05:28.520 --> 05:33.600]  if you know what you're doing, or you can just place a casual bet and be right for a few
[05:33.600 --> 05:39.640]  dollars and cents. Specifically, we're going to be talking about President Donald
[05:39.640 --> 05:47.940]  Trump's tweets, as that's kind of a famous thing, and how they pertain to the political sports
[05:47.940 --> 05:53.760]  betting market. Okay. Let's get into that now, because right now we're talking about
[05:53.760 --> 05:59.640]  piecount.com, which is a tweet counter that's out there on the internet. It keeps track,
[05:59.640 --> 06:04.820]  on a weekly basis, of the president's tweets, as well as those of the VP, White House, and POTUS
[06:04.820 --> 06:13.040]  accounts. And this is all geared toward the tweet markets that PredictIt offers, or used to offer,
[06:13.040 --> 06:20.580]  and we'll get into that in a second. But we're talking about Twitter markets.
[06:20.580 --> 06:27.300]  Again, this is on a weekly basis. The question, or the market, is how many tweets the president
[06:27.300 --> 06:31.980]  will tweet out in a particular week, starting from Wednesday of this week and going on to
[06:32.620 --> 06:38.500]  Wednesday of the next week. It could be anywhere from 50, which is highly
[06:38.500 --> 06:43.780]  unlikely, all the way up to 300 or 400, and that's what I was
[06:43.780 --> 06:51.540]  analyzing beforehand. Now, I wanted to know if the profits could be based on analytics:
[06:51.540 --> 06:58.280]  could I anticipate what he would be tweeting based on current events,
[06:58.280 --> 07:05.100]  news, drama, anything that's out there in the global space? The goal with this
[07:05.100 --> 07:11.380]  original research was to get the president to pay my energy bill. If I made enough money
[07:11.380 --> 07:16.520]  on this PredictIt.org market, could I pay my energy bill at the house
[07:17.120 --> 07:21.400]  with Donald Trump's tweets? That's where I was going with this whole thing.
[07:23.160 --> 07:30.340]  So again, the prior research was on the frequency of the tweets, how often he would tweet in a week.
[07:30.340 --> 07:41.360]  I was capturing that, as you'll see here, in the CSV file on the right side of the screen,
[07:41.840 --> 07:51.200]  where we see 68 in the week of April 2019. We see 137, 140, 141. These are all
[07:52.360 --> 07:58.080]  a rolling, or running, tally of his tweets. And of course, you graph it using
[07:58.080 --> 08:03.880]  a Jupyter notebook; you graph it and see the frequency here.
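The weekly tally described above can be rebuilt from raw tweet timestamps. The following is a minimal sketch, not the script from the talk: it assumes the timestamps are naive `datetime` values in a single time zone and that each market week runs from midnight Wednesday to midnight the following Wednesday (the real market's cutoff time may differ). The names `market_week_start` and `weekly_counts` are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

WEDNESDAY = 2  # datetime.weekday() numbers Monday as 0, so Wednesday is 2


def market_week_start(ts: datetime) -> datetime:
    """Return midnight of the most recent Wednesday on or before ts (assumed cutoff)."""
    days_back = (ts.weekday() - WEDNESDAY) % 7
    start = ts - timedelta(days=days_back)
    return start.replace(hour=0, minute=0, second=0, microsecond=0)


def weekly_counts(timestamps):
    """Count tweets per Wednesday-to-Wednesday week, oldest week first."""
    counts = Counter(market_week_start(ts) for ts in timestamps)
    return sorted(counts.items())


if __name__ == "__main__":
    sample = [
        datetime(2019, 4, 3, 9, 15),   # Wednesday
        datetime(2019, 4, 5, 22, 0),   # Friday, same market week
        datetime(2019, 4, 10, 1, 30),  # next Wednesday starts a new week
    ]
    for week_start, n in weekly_counts(sample):
        print(week_start.date(), n)  # prints "2019-04-03 2", then "2019-04-10 1"
```

The resulting (week, count) pairs are what you would plot in a notebook to spot weekly peaks.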
[08:03.880 --> 08:10.160]  And then you ask yourself, was this peak here due to impeachment drama? Was this due
[08:10.160 --> 08:17.040]  to some sort of scandal or emergency in the Middle East? What caused these peaks of tweets?
[08:17.040 --> 08:21.440]  Was it just somebody getting on the president's nerves? These were all the questions I was
[08:21.440 --> 08:27.300]  asking at the time, but there's a problem: PredictIt.org actually stopped their
[08:27.300 --> 08:35.920]  tweet markets. So while PredictIt is still around, they no longer offer you the ability to
[08:35.920 --> 08:41.600]  bet on tweets. And so all of this research just stopped.
[08:41.600 --> 08:45.600]  They changed their offerings. They started to offer other markets that
[08:46.160 --> 08:49.720]  just weren't moving fast enough. These other markets aren't
[08:49.720 --> 08:54.180]  resolving on a weekly basis. And so my profits slowed to a halt.
[08:55.200 --> 08:59.720]  Because of that, this whole project that I was working on, as far as tweet frequency,
[09:00.700 --> 09:05.680]  went on the back burner, and I kind of forgot about it for a couple of months.
[09:06.440 --> 09:14.020]  But it got resurrected. The project lives, and I'll tell you why.
[09:14.280 --> 09:23.080]  I tweet on a regular basis, and I tweet very vague things, and that's on purpose,
[09:23.080 --> 09:28.640]  I would say. It could be a Zen-of-the-day quote. It could be something obscure or
[09:28.640 --> 09:35.900]  just something that's on my mind, right? Because that's exactly what Twitter is used for:
[09:35.900 --> 09:40.960]  you just let it out there. It's not always political. It's not always socially driven.
[09:40.960 --> 09:47.780]  It's just a thought that's on your mind. Well, one of my followers got a little
[09:47.780 --> 09:52.960]  offended, or had his feathers ruffled, by one of my tweets.
[09:53.320 --> 10:01.720]  He kind of got mad at me, and, you know, I wasn't being political, but the
[10:01.720 --> 10:06.360]  vagueness of my tweet allowed this particular person to
[10:06.360 --> 10:14.060]  project their political ideals onto my tweet. I thought about that for a while,
[10:14.060 --> 10:18.440]  and I could have gotten mad and gone back and forth with him,
[10:18.440 --> 10:24.000]  but I didn't. Instead: I don't get mad, but I will turn you into research.
[10:24.220 --> 10:30.540]  And that's what's bringing us here today. So again, the target projected their beliefs onto me
[10:30.540 --> 10:36.220]  and the meaning of my tweet, even though there was no correlation there; there was nothing
[10:36.220 --> 10:44.280]  to draw conclusions from. And so I was blindsided by rage, meaning that I had no idea
[10:44.280 --> 10:49.880]  this was going to happen. But I asked myself, could I have seen this coming?
[10:50.100 --> 10:57.400]  Could I have predicted that what I tweeted would result in outrage, not
[10:57.400 --> 11:04.580]  just from a certain demographic, but maybe from specific people? And this is still a work in
[11:04.580 --> 11:12.740]  progress. So, who are we targeting with what we're working with today? Well, as
[11:12.740 --> 11:17.120]  stated in the last slide, we're talking about
[11:17.120 --> 11:21.820]  political opponents, and not because we're trying to be political, but just because this is a
[11:21.820 --> 11:27.240]  use case for what we're doing here. This is a scenario. Corporate marks: we could
[11:27.240 --> 11:32.260]  use what we're going to do today for spear phishing, or maybe password lists, or
[11:32.260 --> 11:40.440]  some sort of targeted execution of an attack. Cyber bullies: there are
[11:40.440 --> 11:46.080]  plenty out there, and wouldn't you want a leg up if you knew the type of vocabulary they were
[11:46.080 --> 11:52.380]  using, the way in which they tweet, the sentiment behind their tweets? Maybe you could
[11:52.380 --> 11:57.800]  work toward a better outcome against your cyber bullies, because yes, they are
[11:57.800 --> 12:05.160]  definitely out there. Now, sadly, I have to address this: this also has some stalker interest.
[12:05.160 --> 12:10.980]  But remember what I said before, in my previous research: stalker interest, this is bad.
[12:10.980 --> 12:17.440]  It can be used for stalking. I do not condone it. I think we've covered that already. Yes.
[12:18.060 --> 12:24.820]  Okay. So what are we talking about? Well, everybody knows Twitter is a gold mine for OSINT and recon.
[12:24.820 --> 12:31.440]  It's not a secret. It's very well known, and it's that way by
[12:31.440 --> 12:36.980]  design. Not everybody has a private account, and it kind of ruins the fun if you are on
[12:37.080 --> 12:44.000]  a private account. So we know Twitter as publicly shared sentiment. If you're going to put
[12:44.000 --> 12:49.820]  it on Twitter, you have to expect that it can be analyzed. It can be scraped. It can be mined.
[12:49.820 --> 12:57.900]  It can be used for research such as today's research. So what we're doing is
[12:57.900 --> 13:05.500]  making an at-a-glance picture of what a target profile is all about.
[13:05.500 --> 13:12.860]  What I mean by the term at-a-glance is this: instead of
[13:12.860 --> 13:17.640]  having to scroll through an entire timeline to understand what this particular profile is all
[13:17.640 --> 13:23.820]  about, what if we could just look at a dashboard, or just one screen, and understand
[13:23.820 --> 13:29.600]  the sentiment of that Twitter profile? That's something that I think would be very
[13:29.600 --> 13:36.860]  valuable. Why are we doing this? Well, because I can, because we can. We're hackers,
[13:36.860 --> 13:41.020]  and this is what we do. We look at something, we look at a problem, we look at
[13:41.020 --> 13:46.040]  maybe a scenario that has recently happened to us, and we say, okay, I know how to fix that, or
[13:46.040 --> 13:51.840]  I know what I can do about that. So yes, because I can, because we can. Remember, we're hackers.
[13:53.500 --> 13:58.280]  All right. We also want to do this because it gives us insight into the tendencies of
[13:58.280 --> 14:05.340]  the profile. Again, the goal here is to find red flags quicker, and maybe catch
[14:05.340 --> 14:13.120]  that outrage before it bubbles up and blindsides us. And of course, as with
[14:13.120 --> 14:17.340]  anything, we could use this for later weaponization. We could turn this into
[14:17.340 --> 14:23.300]  an offensive tool if needed. We're talking about password lists, further social
[14:23.300 --> 14:29.400]  profiling, maybe psychological profiling, and troll bots; maybe they could
[14:29.400 --> 14:34.920]  have a conversation with themselves. Now, how are we going to do this? Well, we're going to look at a
[14:34.920 --> 14:42.440]  couple of things. We're going to track a small sample of the recent tweets in the
[14:42.440 --> 14:47.680]  timeline, and an ongoing sample, of course, as we add to it programmatically. And we're going to be
[14:47.680 --> 14:55.600]  doing our analysis on word frequency, word choice, retweet frequency, and hashtag usage.
[14:55.600 --> 15:01.220]  That last one, as we know, is pretty important. All right. So let's take a
[15:01.220 --> 15:06.420]  look at the code. But before we do, a couple of notes here: you can play the game at home.
[15:06.420 --> 15:11.100]  By the time you're watching this, this should be available and public for
[15:11.100 --> 15:15.220]  your consumption, so you can follow along with what I'm doing here in the next few
[15:15.220 --> 15:22.460]  slides. I'll give you a minute to take that link down, and we'll go
[15:22.460 --> 15:27.920]  from there. Actually, there are a couple more points here. It's basically my GitHub
[15:27.920 --> 15:34.620]  handle and the name of the project, Twitter Word Frequency. Now, I
[15:34.620 --> 15:41.240]  scraped this information with Python, and it's all Python. All of this is Python. But I
[15:41.240 --> 15:48.700]  scraped with an actual Python file, and then I used a Jupyter notebook for more analysis.
[15:48.700 --> 15:53.800]  There's a reason for that as well, and we'll see it in the next couple of slides. In the
[15:53.800 --> 15:59.320]  background here, you'll see a code snippet of the actual script, which pulls the timeline.
[15:59.320 --> 16:04.320]  Now, there are a couple of things here. It first checks to see if the timeline has already been
[16:04.320 --> 16:09.400]  saved on your machine. If it hasn't been saved, it goes out, using the
[16:09.400 --> 16:16.440]  Twitter API, to grab the latest timeline of your subject, or your target. Okay.
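The cache-then-fetch behavior described above might look like the following minimal sketch. It is not the project's actual script: `load_timeline` is a hypothetical helper, the cache layout (one JSON file per handle) is an assumption, and `fetch` stands in for whatever Twitter API call you use. In practice it might wrap a client method such as Tweepy's `API.user_timeline` with `count=200`, but that wiring is left out here.

```python
import json
from pathlib import Path
from typing import Callable, List


def load_timeline(handle: str,
                  fetch: Callable[[str], List[dict]],
                  cache_dir: str = "timelines") -> List[dict]:
    """Return handle's timeline from a local JSON cache, fetching it on a cache miss."""
    cache_file = Path(cache_dir) / f"{handle}.json"
    if cache_file.exists():
        # Saved on an earlier run: reuse it and skip the API call entirely.
        return json.loads(cache_file.read_text(encoding="utf-8"))
    # Cache miss: ask the supplied fetch function for the latest tweets...
    tweets = fetch(handle)
    # ...and save them so the next run (or notebook cell) can work offline.
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(tweets), encoding="utf-8")
    return tweets
```

Passing the fetch function in keeps the caching logic testable without network access, and deleting a handle's JSON file forces a fresh pull.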
[16:16.560 --> 16:21.600]  Now, in the foreground here, you're going to see a snippet from the Jupyter notebook.
[16:21.720 --> 16:27.660]  I used the Jupyter notebook because maybe I wanted to run a little snippet of the code
[16:27.660 --> 16:33.560]  without running the entire script. So a lot of the code is going to be the same, but with
[16:33.560 --> 16:37.400]  a Jupyter notebook, as I'm sure everybody who's in data science (remember,
[16:37.400 --> 16:43.380]  I'm not) already understands, you can take these snippets of code
[16:43.380 --> 16:49.200]  and run them individually, and change things as new data becomes available to you.
[16:50.020 --> 16:55.100]  All right. So let's talk now about the analysis. This first analysis is actually
[16:55.100 --> 17:02.680]  of myself. This is my own analysis, on my handle, @ChenBox. So on the left side,
[17:02.680 --> 17:07.260]  you'll see a couple of the words that I've used on a regular basis. Of course,
[17:07.260 --> 17:13.940]  with the most recent events, you see "mask" is used there six times, and "Twitter,"
[17:13.940 --> 17:19.960]  because I've had opinions about Twitter on Twitter. So that's there.
[17:19.960 --> 17:25.340]  Now, you'll notice these red blotted-out pieces. These are Twitter
[17:25.340 --> 17:30.660]  accounts, and how often I've referenced those accounts. So this
[17:30.660 --> 17:36.540]  can be a way, again, we're talking about at-a-glance, that we can see who we're
[17:36.540 --> 17:43.540]  looking at and who we're associating with, quite quickly and right up front.
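The word and mention counts on this slide can be approximated with the standard library alone. This is a minimal sketch under stated assumptions, not the project's code: the tokenizer regex, the URL stripping, and the optional `stop_words` set are illustrative choices, so the totals it produces will not necessarily match the 2,638 shown on the slide. Note that lowercasing means "I" is counted as `i`, and the `RT` prefix on retweets shows up as the word `rt`. `word_stats` is a hypothetical name.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Set, Tuple

URL = re.compile(r"https?://\S+")
# A word, @mention, or #hashtag, optionally with one apostrophe (e.g. "don't").
TOKEN = re.compile(r"[@#]?\w+(?:'\w+)?")


def word_stats(texts: Iterable[str],
               stop_words: Optional[Set[str]] = None) -> Tuple[int, Counter, Counter]:
    """Return (total token count, word counts, @mention counts) for a tweet sample."""
    tokens = []
    for text in texts:
        text = URL.sub(" ", text)  # drop links so t.co fragments aren't counted as words
        tokens.extend(tok.lower() for tok in TOKEN.findall(text))
    stop = stop_words or set()
    # Plain words only: mentions and hashtags are tallied separately.
    words = Counter(t for t in tokens if t[0] not in "@#" and t not in stop)
    mentions = Counter(t for t in tokens if t.startswith("@"))
    return len(tokens), words, mentions


if __name__ == "__main__":
    sample = [
        "RT @alice: Wear a mask!",
        "I think the mask debate on Twitter is wild https://t.co/abc",
        "@alice I agree #masks",
    ]
    total, words, mentions = word_stats(sample)
    print(total)               # 18
    print(words.most_common(2))
    print(mentions["@alice"])  # 2
```

Passing a stop-word set (for example `{"the", "a", "to", "and"}`) shows how quickly the counts fall off once the small glue words are removed.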
[17:43.540 --> 17:50.460]  Okay. In the middle, at the top, you'll see I have a red dot right
[17:50.460 --> 17:57.740]  next to RT, which of course means retweet. And you'll see that my retweet frequency is 54
[17:57.740 --> 18:02.760]  in a 200-tweet sample. That's what that means. Remember, this is out of 200 tweets:
[18:02.760 --> 18:09.160]  54 of those 200 were retweets of other people. And of course, with all these
[18:09.160 --> 18:13.980]  blotted-out Twitter accounts, I think it would be safe to assume that if I'm
[18:13.980 --> 18:19.180]  not mentioning somebody, I'm definitely retweeting them. At the bottom, you'll see
[18:19.180 --> 18:25.020]  2,638, which is the word count for the last 200-tweet sample of my own personal
[18:25.020 --> 18:32.740]  account. I'm using mine as a baseline. And on the right side, you'll see
[18:32.740 --> 18:38.720]  the hashtags and, of course, their frequency. At the bottom is the associated
[18:38.720 --> 18:44.960]  histogram, right? So it's a histogram of the hashtags
[18:44.960 --> 18:53.060]  there at the top. And there's my handle, so you know that this particular slide was
[18:53.060 --> 19:00.320]  for my own profile analysis. Okay. So, remember that person I said got kind
[19:00.320 --> 19:06.540]  of angry at my vague tweet? Well, this is the analysis slide for his account.
[19:06.540 --> 19:10.520]  You'll see that there are a couple of differences, obviously, because we're not
[19:10.520 --> 19:16.540]  exactly the same. On the left side, we'll see he's mentioning a couple of
[19:16.540 --> 19:22.580]  people who are blotted out, not as frequently as I do, but, you know, these
[19:22.580 --> 19:30.080]  profiles are in the scan that I've seen. You'll see his retweet frequency is 178.
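Retweet frequency, the 54-of-200 and 178-of-200 figures, can be counted by checking each tweet's text for the classic `RT @user:` prefix. This is a minimal sketch assuming that text convention; if you keep the raw API objects, the presence of a `retweeted_status` field is a more reliable signal. `retweet_stats` is a hypothetical helper that also tallies which accounts were retweeted, which bears on whom a profile shares sentiment with.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Classic retweet text looks like "RT @someone: original tweet..."
RT_PREFIX = re.compile(r"^RT @(\w+):")


def retweet_stats(texts: Iterable[str]) -> Tuple[int, int, Counter]:
    """Return (retweet count, sample size, retweets per source account)."""
    texts = list(texts)
    sources = Counter()
    for text in texts:
        match = RT_PREFIX.match(text)
        if match:
            sources[match.group(1).lower()] += 1
    return sum(sources.values()), len(texts), sources


if __name__ == "__main__":
    sample = ["RT @bob: hi", "RT @Bob: again", "my own words", "RT @carol: yo"]
    retweets, total, sources = retweet_stats(sample)
    print(f"{retweets} of {total} tweets are retweets")  # 3 of 4 tweets are retweets
    print(sources.most_common(1))                        # [('bob', 2)]
```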
[19:30.080 --> 19:38.160]  So out of a 200-tweet sample, we're talking about 178 of those being
[19:38.160 --> 19:46.420]  retweets. I also put a red dot on the word "I," because I was just kind of curious
[19:46.420 --> 19:53.740]  to see how often somebody uses "I" in a sentence. I'm sure there's a
[19:53.740 --> 20:00.380]  psychological profile based on the use of that word, right? And at the bottom, we see
[20:00.380 --> 20:06.520]  the word count, 3,788. I know that there's math we can do on
[20:06.520 --> 20:13.300]  this word count, as far as calculating the types of words being
[20:13.300 --> 20:18.300]  used, how big they are, how small they are, et cetera. Another interesting thing
[20:18.300 --> 20:24.740]  about this particular analysis is that the histogram on the right side indicates that
[20:24.740 --> 20:31.840]  this particular individual uses each hashtag only once, like one and done.
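The hashtag table and its histogram can be sketched the same way: count each hashtag, then count how many hashtags occur once, twice, and so on. A profile whose histogram has a single bar at 1 is the "one and done" pattern described above. This is an illustrative sketch, not the project's code: `hashtag_stats` is a hypothetical name, and lowercasing the tags (so `#MAGA` and `#maga` merge) is an assumed normalization the original tool may not make. Because Python's `\w` does not match most emoji, an emoji written directly after a tag would likely end the match and fold it into the plain tag, which may or may not be what you want.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

HASHTAG = re.compile(r"#(\w+)")


def hashtag_stats(texts: Iterable[str]) -> Tuple[Counter, Counter]:
    """Return (uses per hashtag, how many hashtags were used n times)."""
    # Lowercase so #MAGA and #maga count as one tag (an assumed normalization).
    tags = Counter(tag.lower() for text in texts for tag in HASHTAG.findall(text))
    # Histogram of the counts themselves: {1: tags used once, 2: tags used twice, ...}
    histogram = Counter(tags.values())
    return tags, histogram


if __name__ == "__main__":
    sample = ["#MAGA rally", "Big day #maga #Jobs", "nothing here", "#Once"]
    tags, histogram = hashtag_stats(sample)
    print(tags.most_common(1))        # [('maga', 2)]
    print(sorted(histogram.items()))  # [(1, 2), (2, 1)]
```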
[20:31.840 --> 20:39.140]  I found that interesting. Again, this is for "@MyTarget." This is not
[20:39.700 --> 20:45.640]  the actual Twitter account, obviously, but I had to identify it somehow.
[20:46.240 --> 20:51.860]  Now, lastly: I used myself as a baseline, and then we saw the target's
[20:51.860 --> 21:00.200]  dashboard analysis. This next one is actually President Donald Trump. And we'll
[21:00.200 --> 21:06.980]  see, of course, a couple of key features here. On the left, we don't see
[21:06.980 --> 21:13.240]  the word "I." Actually, we do, but it's no more than 20 times. Now, for those of us who
[21:13.240 --> 21:18.260]  might think that President Trump is a little bit of a narcissist, we might be kind of
[21:18.260 --> 21:25.800]  surprised to see that he's tweeted the word "I" only about 20 times in a
[21:25.800 --> 21:30.460]  200-tweet sample. At the bottom there, we see 3,400 words. These are the same stats as before, and they're
[21:30.460 --> 21:36.600]  interesting. He's retweeted only 96 times in the past 200-tweet
[21:36.600 --> 21:42.620]  sample. Of course, this piece is not so surprising, because we know that he uses a lot of
[21:42.620 --> 21:50.620]  his own words in his tweets. But back to the left side really quick. The
[21:50.620 --> 21:55.580]  only blotted-out account there: I don't know if anybody wants to take a quick guess at what that
[21:55.580 --> 22:01.740]  is. I know I can't really wait for your answer, as this is the closest we can
[22:01.740 --> 22:09.780]  get to audience participation at the moment. But yes, if you guessed that that
[22:09.780 --> 22:14.880]  was his own account, you are correct. What I'm saying is he's mentioned himself in his
[22:14.880 --> 22:21.560]  own tweets about 46 times. So that might be an indicator of narcissism, or maybe of tweeting in
[22:21.560 --> 22:26.840]  the third person. We'll have to analyze further for that. And on the right side, we'll see
[22:26.840 --> 22:33.460]  again our histogram and hashtag matching. President Trump has tweeted a
[22:33.460 --> 22:38.620]  couple of things with hashtags, but of course the recurring one is MAGA. Is anybody surprised?
[22:38.620 --> 22:43.380]  I don't think anybody is. I'm not. And actually, that would have been three if he hadn't
[22:43.380 --> 22:50.640]  added emojis to that third MAGA there at the bottom. So again, my point is, all
[22:50.640 --> 22:55.880]  of this was programmatically
[22:55.880 --> 23:03.020]  scraped and put together so that we can, at a glance, take a look at what a target is
[23:03.020 --> 23:07.260]  talking about. I understand that this could easily be
[23:07.260 --> 23:14.840]  done through APIs and other research. This is my own basic whiteboard; this is my
[23:14.840 --> 23:20.940]  own from-scratch analysis of these targets: myself, President Trump because of
[23:20.940 --> 23:27.720]  the sports betting, and of course somebody who got really mad at me for
[23:27.720 --> 23:34.380]  something that was really vague. Okay. So that's a quick overview of the data.
[23:34.380 --> 23:40.500]  Now let's take a look at the insights. Some of the insights: it seems
[23:40.500 --> 23:46.060]  that, of course, retweets indicate shared sentiment. Well, duh, but the question is
[23:46.060 --> 23:51.620]  whom we are sharing that sentiment with. This might give us insight into
[23:51.620 --> 23:58.420]  who a person is associating with, how frequently, or how closely related
[23:58.420 --> 24:04.240]  the ideals they share may be, depending on the retweet, of course. And of course,
[24:04.240 --> 24:12.920]  we see narcissism in hashtags. Actually, okay, there are a couple of things here.
[24:12.920 --> 24:19.020]  I did forget to include
[24:19.020 --> 24:25.840]  a screenshot I have of my target tweeting himself, that is, his own handle,
[24:25.840 --> 24:32.580]  in the hashtags. So it's hashtag-his-own-handle, and I found that to be very
[24:32.940 --> 24:37.500]  narcissistic and kind of interesting, and that's something I wanted to piece in
[24:37.500 --> 24:41.220]  there. So I do apologize for not having that screen cap in there, but I found it
[24:41.220 --> 24:47.540]  interesting, and that's why I put it here in the insights. And repeating
[24:47.540 --> 24:51.640]  hashtags, to me, are kind of obvious when it comes to somebody who's
[24:51.760 --> 24:57.580]  a regular Twitter user: they depict brand or focus, and of course, we see that
[24:57.580 --> 25:04.640]  with repeating hashtags. Now, I've noticed that the word
[25:04.640 --> 25:10.760]  frequency count drops off significantly after the stop words. I have a rudimentary
[25:10.760 --> 25:15.140]  introduction to natural language processing, so I understand a little bit about
[25:15.140 --> 25:22.500]  tokenization and what stop words are: stop words being the very small words
[25:22.500 --> 25:28.080]  that kind of glue the English language together, right? And so, again, after
[25:28.080 --> 25:33.640]  those words are scraped off the top, the frequency of the other words really
[25:33.640 --> 25:41.280]  drops. And of course, those who are linguists may already know that. Now,
[25:41.280 --> 25:45.240]  obviously, with what I've presented today, we know that descriptive
[25:45.240 --> 25:52.160]  statistics are as far as we can get without further analysis with better
[25:52.160 --> 25:58.260]  tools. Better tools being, of course, natural language processing and
[25:58.260 --> 26:04.540]  maybe some other types of data science tools and techniques. And so that is
[26:04.540 --> 26:11.500]  where this work in progress is going to be heading next. Okay, speaking of which, where
[26:11.500 --> 26:16.420]  do we go from here? Well, of course, I want to clean up the data collection
[26:16.420 --> 26:20.560]  methods. Right now they're pretty rough, and I'd like to turn them into something that is
[26:20.560 --> 26:27.760]  easier to read and follow. Now, better data visualization.
[26:27.800 --> 26:32.720]  I quickly grabbed screenshots together, but of course, I'd like to incorporate word
[26:32.720 --> 26:38.980]  clouds and other, maybe better, charting, right? Dashboard of data: I'd like to clean
[26:38.980 --> 26:47.000]  that up so that, again, the whole goal is an at-a-glance Twitter analysis.
[26:47.180 --> 26:52.080]  Okay, uh, now, what can we do with this? Well, we could talk about, uh, profile scoring. So, for
[26:52.080 --> 26:58.240]  instance, um, leanings, whether it's a sentimental leaning, a, uh, political leaning, a corporate
[26:58.240 --> 27:05.180]  leaning, uh, I'd like to see if we can, uh, we can understand that, um, at, at a quick look. Um,
[27:05.180 --> 27:11.740]  the potential for extremism, um, if, if somebody is getting mad at you for, uh, a tweet that was
[27:11.740 --> 27:19.780]  vague enough for them to project, um, meaning onto it, um, then, then what is the potential for
[27:19.780 --> 27:23.320]  somebody to go kind of a little bit more off the rails? That's something else that I'd like to look
[27:23.320 --> 27:29.760]  at. And, of course, me being a psychology buff, ongoing psychological
[27:29.760 --> 27:34.640]  profiles and red-flag notifications. If there's a red flag in anything that we see in
[27:34.640 --> 27:40.200]  their timeline, can we detect that? And, of course, I'm almost certain that there's
[27:40.200 --> 27:45.360]  already research out there. I'd like to continue looking at that myself and building
[27:45.360 --> 27:52.240]  from there. So, features to incorporate. I've said it a lot already: natural language processing.
[27:52.240 --> 27:57.580]  I think that'll be a very good thing to add in here. Sentiment analysis, also part of
[27:57.580 --> 28:04.820]  that space. A mirror bot. Real quickly, what I mean there is this:
[28:04.820 --> 28:12.300]  if we know how our target speaks, can we make a bot that talks like them, so they end up talking to themselves? I think that
[28:12.300 --> 28:17.420]  would be kind of funny to watch. Hopefully it doesn't get anybody banned, but I think it'll be
[28:17.420 --> 28:22.540]  funny to see, just as a fun project. It's all just jokes, folks.
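One simple way to prototype the mirror bot idea is a word-level Markov chain: record which word follows which in the target's public tweets, then random-walk those transitions to generate text in their voice. This is a swapped-in technique for illustration, not the speaker's implementation; `build_chain` and `mirror_reply` are hypothetical names, and the sketch only generates text locally rather than posting anything.

```python
import random
from collections import defaultdict


def build_chain(tweets):
    """Map each word to the list of words that followed it in the target's tweets.

    Also returns the first word of every non-empty tweet, used as starting points.
    Duplicated followers are kept, so common transitions are picked more often.
    """
    chain = defaultdict(list)
    starts = []
    for text in tweets:
        words = text.split()
        if not words:
            continue
        starts.append(words[0])
        for current, nxt in zip(words, words[1:]):
            chain[current].append(nxt)
    return chain, starts


def mirror_reply(chain, starts, max_words=20, seed=None):
    """Random-walk the chain from a word that began one of the tweets.

    Stops at max_words or when the current word was never followed by anything.
    Raises IndexError if `starts` is empty (no tweets to imitate).
    """
    rng = random.Random(seed)
    word = rng.choice(starts)
    output = [word]
    while len(output) < max_words and chain.get(word):
        word = rng.choice(chain[word])
        output.append(word)
    return " ".join(output)
```

With only a handful of tweets, the output mostly replays the original sentences; the mixing (and the comedy) only shows up once a timeline has enough overlapping words for the walk to jump between tweets.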
[28:22.540 --> 28:30.320]  It's just jokes. Now, I want to build off of the prior research that I myself and
[28:30.320 --> 28:35.320]  the community have done, including my stalker research and anti-stalking research
[28:35.320 --> 28:39.620]  and anything else the community has done, and incorporate it into what I'm doing now.
[28:40.020 --> 28:45.440]  And even though today I've only shown three profiles, I'd like to do a
[28:45.440 --> 28:51.840]  quick mass scripted analysis of maybe everybody who is following me or
[28:51.840 --> 28:58.300]  everybody I'm following, and just see what kind of results show up in their timelines.
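A mass scripted analysis like this could be a loop over the follower or following handles that waits between requests, so it never fetches faster than a fixed interval. A minimal sketch, assuming a hypothetical `fetch_timeline` callable (a placeholder, not a real Twitter client API) and a fixed per-request interval; the actual limit depends on the API and endpoint and is not given in the talk. `analyze_accounts` is also a hypothetical name.

```python
import time


def analyze_accounts(handles, fetch_timeline, analyze, min_interval=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Apply `analyze` to each account's public timeline, one account at a time.

    Consecutive fetches are spaced at least `min_interval` seconds apart.
    `fetch_timeline(handle)` is a hypothetical placeholder for whatever client
    actually retrieves the tweets; `sleep` and `clock` are injectable for testing.
    """
    results = {}
    last_call = None
    for handle in handles:
        if last_call is not None:
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)  # throttle: never start fetches faster than min_interval
        last_call = clock()
        try:
            tweets = fetch_timeline(handle)
        except Exception as exc:  # one bad account (protected, renamed, ...) should not stop the batch
            results[handle] = {"error": str(exc)}
            continue
        results[handle] = analyze(tweets)
    return results
```

A fixed interval is the simplest policy to reason about; injecting `sleep` and `clock` lets the throttling be checked without actually waiting, and a real client could swap in a back-off based on whatever rate-limit information its API reports.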
[28:58.300 --> 29:03.440]  Remember, these are public timelines. And, of course, I don't want to hit any rate limits,
[29:03.440 --> 29:07.980]  so we'll do it slowly. Some future questions that are on my mind when it
[29:07.980 --> 29:14.780]  comes to this: can I get a target to talk to themselves and agree or disagree?
[29:14.780 --> 29:20.540]  I think that'd be funny. That's what I was mentioning with the mirror bot. Can a
[29:20.540 --> 29:25.420]  person have a conversation with themselves and disagree with their own points, extremist
[29:25.420 --> 29:31.160]  or not? The next question is: can outrage be predicted? That's
[29:31.160 --> 29:36.020]  something that I brought up earlier in
[29:36.020 --> 29:40.640]  this talk as well. Could it be predicted based on current events in the
[29:40.640 --> 29:46.600]  news and what's going on globally at the moment? Next one: of course, any
[29:46.600 --> 29:50.580]  questions that the audience may have for me, I would definitely want to put into this
[29:50.580 --> 29:59.180]  project. So, to summarize: who said nothing good came out of sports betting?
[29:59.420 --> 30:04.360]  I think there's some benefit here. I'm glad that I went down this rabbit hole.
[30:05.360 --> 30:12.300]  Next, sharing vague sentiments can still make frenemies. I was not
[30:12.300 --> 30:18.520]  expecting that, even in today's climate, but hey, it is what it is. Now,
[30:18.520 --> 30:23.640]  remember, don't get mad. Don't feed the outrage.
[30:23.640 --> 30:28.800]  Don't send the rage back. Just get productive. Take what you have and
[30:28.800 --> 30:33.200]  turn it into research. That's been my guiding light for the past five years,
[30:33.200 --> 30:38.760]  maybe longer. Start somewhere. Like I said, this project is definitely a work in progress,
[30:38.760 --> 30:43.960]  but I'm going to start somewhere. Okay, so fork me:
[30:45.200 --> 30:49.760]  there's the link again, the GitHub link, if you didn't get it before: Twitter Word Frequency.
[30:50.280 --> 30:58.740]  Oh, sorry, I'll give you a couple of seconds on that one. A couple more: resources. So the
[30:58.740 --> 31:06.040]  top link is actually another talk, a college speech on natural language processing
[31:06.040 --> 31:12.700]  and how it works. And then, of course, these next two links
[31:12.700 --> 31:16.740]  are very good. I'll let you check those out, because they are actually
[31:16.740 --> 31:21.980]  kind of in line with what I'm trying to do. I'm trying to do this on a massive scale,
[31:21.980 --> 31:26.540]  whereas these last two links focus specifically
[31:26.540 --> 31:37.480]  on Donald Trump's Twitter account. Thank you very much. And that concludes my talk.
